Summary

We consider a permutation method for testing whether observations given in their natural pairing 
exhibit an unusual level of similarity in situations where any two observations may be similar at 
some unknown baseline level. Under a null hypothesis where there is no distinguished pairing of 
the observations, a normal approximation with explicit bounds and rates is presented for determining 
approximate critical test levels. 
1 Introduction 

The work in this paper was motivated by examples such as the following. 

Example 1. Schiffman et al. (1978), with statistical assistance by one of the 
authors, studied the influence of a doctor's prior probabilities of diseases on diagnosis. 
Statistical thinking, which can be formalized in Bayesian terms, suggests 
that given a set of symptoms, a doctor's diagnosis or ranking of possible diagnoses 
should be influenced not only by the symptoms, but also by the disease prevalence 
at the time of diagnosis. Doctors' information on prevalence may come, for exam- 
ple, from textbooks, articles, and personal experience. The goal of the study was 
to verify the influence of personal prior information or opinion on disease preva- 
lence (henceforth referred to as "personal prior") on diagnosis and help determine 
whether doctors need to be better educated to take prevalence into account, or if 
providing them with information on prevalence at the time of diagnosis is useful. 

In this study each doctor in a sample produced first a ranking X of the preva- 
lence, or of the probability of various diseases from a given list; such a ranking 
represents the doctor's personal prior. A compatible medical scenario was then 
presented to all doctors, and each one of them produced a ranked list Y of possible 
diagnoses from the same given list. Rank correlations between X and Y for each 
doctor were then computed. To test the hypothesis that a doctor's personal prior 
does not influence his diagnostic rankings, a null hypothesis of zero correlation between 
each doctor's X and Y is not appropriate. Even with no such influence, one 
would expect that pairs of rankings would have some nonzero baseline correlation 
due to the influence of other factors like common medical knowledge. The null hypothesis 
of interest that there is no influence of personal prior is complex since the 
baseline correlation is unknown. The presence of an unknown baseline correlation 
raises the question of how high the within-doctor rank correlations need to be to 
reject the null hypothesis and assert the claim that there is influence of personal 
prior on diagnostic rankings. 

Correlations are used here as a measure of similarity between ranked lists. 
Henceforth we will talk about similarity in general, and the approach applies to 
any measure of similarity or proximity defined on the sample space. 
The main focus of this paper is on examples of the following kind: 
Example 2. This example is somewhat artificial, but it is simpler and can clar- 
ify the issue; it will also help in explaining the example that follows it which is 
rather similar. Consider an instructor who wants to know if students are copying 
from their neighbors in a class where students take an exam while seated in pairs. 
Given a measure of similarity between exams, we expect any two exams to be simi- 
lar even in the absence of copying. Common knowledge that all students hopefully 
have would make their exams similar to a certain, unknown degree. Therefore, we 
want to test if the similarity between seated pairs is unusual (due to copying) relative 
to some unknown baseline similarity. This example is different from the 
first in that here a similarity score can be computed for any pair of exams $X_i$, $X_j$, 
whereas in the first example the correlations of interest are those between X and 
Y. 

Example 3. Situations similar to Example 2 arise naturally in environmental 
and medical studies, where subjects in a given study group are matched (paired) by 
certain common background of interest, such as having lived in the same neighbor- 
hood during a given period, having certain common medical conditions or having 
certain variables in common (e.g., gender, age, weight, etc.). In order to assess 
the influence of the background in question on a given set of certain medical conditions 
(denoted by $X_i$ for subject $i$), one should test whether matched pairs are 
more similar than unmatched ones relative to the medical condition being studied. 
The baseline similarity between unmatched pairs is again unknown, but a certain 
degree of similarity must certainly exist due to common factors that all subjects in 
the particular study might have. More specifically, suppose we have an even number 
$n$ of subjects and those indexed by $2i-1$ and $2i$ form the matched pairs for 
$i = 1, \ldots, n/2$, and let $X_i$ measure subject $i$'s medical condition. Our goal is to 
test whether all $X_{2i-1}$ and $X_{2i}$, which arise from the matched pairs, exhibit more 
similarity than $X_i$ and $X_j$ from unmatched pairs. 

A related testing problem arises in the design of studies involving matching of 
subjects that are similar by some background criteria in order to reduce variability 
in other variables of interest. The matching process often requires a great effort. 
The question of whether it achieves its purpose in producing a higher level of 
similarity in the variables of interest than would be achieved at random, can be 
tested as described in this paper. 

Example 1 is a specific instance of a problem of the following type. Consider 
pairs of observations $(X_1, Y_1), \ldots, (X_n, Y_n)$, where $X_i$ and $Y_i$ take values 
in a space so that a proximity function $c(X, Y)$ is defined. This function may 
sometimes be obtained as a decreasing function of some metric. However, for the 
rankings of Example 1, the rank correlation is a relevant proximity function not 
derived from a metric. We want to test whether the natural pairing of $X_i$ to $Y_i$ 
exhibits a significantly higher level of proximity or similarity than an unknown 
baseline level. The null hypothesis that the level of proximity or similarity in the 
natural pairing is the same as the baseline level can be formulated as the hypothesis 
that the observations $[(X_1, Y_{\pi(1)}), \ldots, (X_n, Y_{\pi(n)})]$ are identically distributed for 
all $\pi \in S_n$, the permutation group of $n$ elements. Conditioning on the observed 
$\{e_{ij} = c(X_i, Y_j)\}$, a permutation test which compares the value of 

$$U_\pi = \sum_{i=1}^n e_{i\pi(i)} \qquad (1)$$

for the special permutation $\pi = \mathrm{id}$ (the identity), against critical values of the 
distribution of $U_\pi$ when $\pi$ is uniform over $S_n$ (the distribution assigning equal 
probabilities to every element in $S_n$), can be used to test the null hypothesis. 

The permutation distribution of $U_\pi$ for $\pi$ uniform over $S_n$ was studied in numerous 
other statistical contexts. For a seminal reference which contains both theory 
and applications see Wald and Wolfowitz (1944). More recent articles which 
in turn contain further references include the following. In connection with linear 
rank statistics, Ho and Chen (1978) and Bolthausen (1984) computed bounds on 
the rate of convergence to normality. Bickel and van Zwet (1978) give more background 
and results on linear rank statistics for two-sample problems, including an 
Edgeworth expansion for a special case of (1). Diaconis, Graham and Holmes 
(1999) discuss similar statistics and also some subsets of permutations related to 
tests for independence. Kolchin and Chistyakov (1973) discuss the permutation 
distribution for the subset of permutations with one cycle. Below we discuss a 
rather different subset of permutations, in which the number of cycles is maximal. 
For general theory on permutation tests see, e.g., Pesarin (2001) and references 
therein. Related work on normal approximations can also be found in Stein 
(1986), whose ideas and methods have strongly influenced us and other authors. 
We will give a brief indication of some basic ideas of Stein's method. 

In this paper we focus on situations as described in Examples 2 and 3 where 
all pairings can be compared. We consider the following framework. Given an 
even number $n$ of paired observations $(X_1, X_2), (X_3, X_4), \ldots, (X_{n-1}, X_n)$, with 
values in a space so that a proximity function $c(X_i, X_j)$ is defined, we want to 
test whether the special pairing of $X_{2i-1}$ with $X_{2i}$ exhibits a significantly higher 
level of similarity than an unknown baseline level. The null hypothesis that the 
similarity level of the special pairing is the same as the baseline level is here formulated 
as the hypothesis that the variables $[(X_i, X_{\pi(i)}),\ i < \pi(i)]$ are identically 
distributed for all $\pi \in \Pi_n$, where 

$$\Pi_n = \{\pi \in S_n : \pi^2 = \mathrm{id},\ \pi(i) \neq i \text{ for all } i\}.$$

The condition $\pi^2 = \mathrm{id}$ reflects the fact that if $i$ is paired with $j$ then $j$ is paired 
with $i$, and the condition $\pi(i) \neq i$ the fact that no $i$ can be paired with itself. The 
special pairing which we suspect may show a high similarity level corresponds to 
the permutation $\hat\pi \in \Pi_n$ specified by the conditions $\hat\pi(2i-1) = 2i$ and $(\hat\pi)^2 = \mathrm{id}$. 
Conditioning on the set of values $\{e_{ij} = c(X_i, X_j)\}$ we consider the permutation 
test which compares the value of $U_\pi$ at the special permutation $\pi = \hat\pi$ against 
critical values of the distribution of $U_\pi$ when $\pi$ is uniform over $\Pi_n$. 
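
The permutation test just described can also be approximated by simulation. The following is a minimal sketch, not the authors' procedure: it assumes the similarities are stored in a square list `e` with zero diagonal, uses 0-based indices so that the special pairing matches positions 2k and 2k+1, and draws uniform fixed-point-free involutions by shuffling and pairing neighbours. The names `u_statistic`, `random_matching`, `permutation_p_value` and the toy scores `x` are hypothetical.

```python
import random

def u_statistic(e, pairs):
    # U = sum_i e[i][pi(i)]; each matched pair (i, j) contributes e[i][j] + e[j][i].
    return sum(e[i][j] + e[j][i] for i, j in pairs)

def random_matching(n, rng):
    # Shuffling and pairing neighbours gives a uniform draw from the
    # fixed-point-free involutions on n points (n even).
    idx = list(range(n))
    rng.shuffle(idx)
    return [(idx[2 * k], idx[2 * k + 1]) for k in range(n // 2)]

def permutation_p_value(e, n_draws=2000, seed=0):
    # Monte Carlo estimate of P(U_pi >= U_observed) under a uniform matching.
    n = len(e)
    rng = random.Random(seed)
    special = [(2 * k, 2 * k + 1) for k in range(n // 2)]  # 0-based special pairing
    u_obs = u_statistic(e, special)
    hits = sum(u_statistic(e, random_matching(n, rng)) >= u_obs
               for _ in range(n_draws))
    return u_obs, (hits + 1) / (n_draws + 1)

# Toy data (assumed): neighbours are very similar, similarity = -|x_i - x_j|.
x = [0, 1, 50, 51, 100, 101, 150, 151]
e = [[0 if i == j else -abs(a - b) for j, b in enumerate(x)]
     for i, a in enumerate(x)]
u_obs, p_value = permutation_p_value(e)
```

For large n the simulation can be replaced by the normal approximation developed in the rest of the paper.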

The two proposed tests discussed above appear similar, as in both tests the 
observed similarity related to a special permutation is compared to critical values 
computed against a null distribution induced by the uniform distribution over a 
space of relevant permutations. For the first test that space is $S_n$ and the special 
permutation is the identity (which matches $X_i$ with $Y_i$), and for the second test 
the space of permutations is $\Pi_n$ and the special permutation is $\hat\pi$ (which matches 
$X_{2i-1}$ with $X_{2i}$). We henceforth discuss only the second case and study the permutation 
distribution relative to $\Pi_n$. The methods used here apply to the permutation 
distribution over the whole of $S_n$ mutatis mutandis. 

For the null hypothesis to be true it is sufficient that the $X$'s are exchangeable, 
but the null hypothesis is complex and does not specify the distribution of $U_\pi$ nor 
the baseline similarity. In the absence of a null distribution, the above permutation 
test seems very natural. 

We shall provide a normal approximation to the permutation distribution of 
$U_\pi$ of (1) including bounds, rates, and explicit constants in order to determine 
approximate critical values for the permutation test. 
Henceforth we suppress the dependence of $U_\pi$ in (1) on $\pi$. Furthermore, for 
values $g_{ij}$ with $g_{ii} = 0$, we set 

$$g_{i+} = \sum_{j=1}^n g_{ij}, \quad g_{+j} = \sum_{i=1}^n g_{ij}, \quad g_{++} = \sum_{i,j=1}^n g_{ij}, \quad \text{and} \quad \bar g_{i+} := \frac{1}{n-1}\, g_{i+}.$$

Note that the terms $e_{i\pi(i)}$ and $e_{\pi(i)i}$ always appear together in the sum $U$, and we 
may therefore assume without loss of generality that $e_{ij} = e_{ji}$. The diagonal terms 
$e_{ii}$ never enter $U$ and we take them to be 0. Given such a collection of numbers 
$e_{ij}$, define 

$$d_{ij} = \begin{cases} e_{ij} - \dfrac{e_{i+}}{n-2} - \dfrac{e_{+j}}{n-2} + \dfrac{e_{++}}{(n-1)(n-2)} & i \neq j \\ 0 & i = j. \end{cases} \qquad (2)$$
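
As an illustration of the centering in (2), the sketch below computes $d_{ij}$ from a symmetric array with zero diagonal and checks that every row of the result sums to zero. It is only a sketch under assumptions: the function name `center` and the toy scores `x` are hypothetical, and exact rational arithmetic is used so that the check is not affected by rounding.

```python
from fractions import Fraction

def center(e):
    """Return d_ij of equation (2) for a square array e with zero diagonal."""
    n = len(e)
    e = [[Fraction(v) for v in row] for row in e]
    row_sum = [sum(e[i]) for i in range(n)]                        # e_{i+}
    col_sum = [sum(e[i][j] for i in range(n)) for j in range(n)]   # e_{+j}
    total = sum(row_sum)                                           # e_{++}
    d = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d[i][j] = (e[i][j]
                           - row_sum[i] / (n - 2)
                           - col_sum[j] / (n - 2)
                           + total / ((n - 1) * (n - 2)))
    return d

# Toy similarities (assumed): e_ij = -|x_i - x_j| with zero diagonal.
x = [3, 1, 4, 1, 5, 9]
e = [[0 if i == j else -abs(a - b) for j, b in enumerate(x)]
     for i, a in enumerate(x)]
d = center(e)
row_sums = [sum(row) for row in d]
```

The vanishing row sums are what make the centered statistic have mean zero under a uniform random matching.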

Bounds to the normal approximation for the permutation distribution of U are 
contained in the following theorem. For convenience we assume without further 
comment that n > 10. 

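Theorem 1 Let $U$ be given by (1), $\pi$ be uniform over $\Pi_n$ and 

$$W = \frac{U - EU}{\sqrt{\mathrm{Var}(U)}}, \quad a = \max |d_{ij} - d_{kl}|, \quad \text{and} \quad \delta = \sup_{w \in \mathbb{R}} |P(W \le w) - \Phi(w)|,$$

where $\Phi$ is the standard normal distribution function. Then 

$$EU = e_{++}/(n-1),$$

$$\mathrm{Var}(U) = \frac{2}{(n-1)(n-3)} \left( (n-2) \sum_{i,k=1}^n e_{ik}^2 + \frac{1}{n-1}\, e_{++}^2 - 2 \sum_{i=1}^n e_{i+}^2 \right),$$

and there exist constants $c_1, c_2$ such that 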

5 < cin 



If, for example, the constants dij are bounded then a is bounded and ^ • df 



0{n^), so in view of the bound above decays at the rate of Var([/) 
$n^{-1/2}$. Below, a somewhat crude calculation gives the upper bounds $c_1 \le 86$, $c_2 \le 243$. 
2 Proof of Theorem 1 

We compute the mean and variance of U in Section 2.1 and establish the upper 
bound on the normal approximation in Section 2.2. 

2.1 Mean and Variance of U 

To compute the mean and variance of $U = \sum_{i=1}^n e_{i\pi(i)}$, where $\pi$ is chosen uniformly 
from $\Pi_n$, we have the following Lemma. 

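Lemma 1 Let $g_{ij}$ satisfy $g_{ii} = 0$ and set 

$$f_{ij} = \begin{cases} g_{ij} - \bar g_{i+} & i \neq j \\ 0 & i = j. \end{cases}$$

Then with 

$$V = \sum_{i=1}^n g_{i\pi(i)}$$

we have 

$$E g_{i\pi(i)} = \bar g_{i+} \quad \text{and therefore} \quad EV = \sum_{i=1}^n \bar g_{i+}$$

and 

$$\mathrm{Var}(V) = \frac{n-2}{(n-1)(n-3)} \left( \sum_{|\{i,j\}|=2} f_{ij}^2 + \sum_{|\{i,j\}|=2} f_{ij} f_{ji} \right).$$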

Proof: Since $\pi(i)$ can be any $j \neq i$ with probability $1/(n-1)$, we have 

$$E g_{i\pi(i)} = \frac{1}{n-1} \sum_{j \neq i} g_{ij} = \frac{1}{n-1}\, g_{i+} = \bar g_{i+},$$

and so 

$$\mathrm{Var}(V) = \mathrm{Var}\Big(\sum_{i=1}^n f_{i\pi(i)}\Big) = E\Big(\sum_{i=1}^n f_{i\pi(i)} f_{i\pi(i)} + \sum_{|\{i,j\}|=2} f_{i\pi(i)} f_{j\pi(j)}\Big) = \frac{1}{n-1} \sum_{|\{i,j\}|=2} f_{ij}^2 + E \sum_{|\{i,j\}|=2} f_{i\pi(i)} f_{j\pi(j)}.$$

Now note that the probability is $1/(n-1)$ that $\pi(i) = j$, and therefore 
that $\pi(j) = i$. If $\pi(i) \neq j$, then $i, j, \pi(i), \pi(j)$ are all distinct, and given any 
$|\{i,j,k,l\}| = 4$ the probability that $\pi(i) = k$ and $\pi(j) = l$ is $1/[(n-1)(n-3)]$. 
We therefore have 

$$E \sum_{|\{i,j\}|=2} f_{i\pi(i)} f_{j\pi(j)} = \frac{1}{n-1} \sum_{|\{i,j\}|=2} f_{ij} f_{ji} + \frac{1}{(n-1)(n-3)} \sum_{|\{i,j,k,l\}|=4} f_{ik} f_{jl}.$$

The first equality below follows by summing over $l \notin \{i, j, k\}$, and using 
$f_{jj} = 0$ and $f_{j+} = 0$, and the second in a similar way by summing over $j \notin \{i, k\}$: 

$$\sum_{|\{i,j,k,l\}|=4} f_{ik} f_{jl} = \sum_{|\{i,j,k\}|=3} f_{ik} (-f_{ji} - f_{jk}) = \sum_{|\{i,k\}|=2} f_{ik} (f_{ki} + f_{ik}) = \sum_{|\{i,k\}|=2} (f_{ik} f_{ki} + f_{ik}^2).$$

The formula for Var(V) now follows by collecting terms. □ 

Writing for the moment $U_d$ and $U_e$ for the values of $U$ based on $d_{ij}$ and $e_{ij}$ 
respectively, we have 

$$U_d = U_e - \frac{e_{++}}{n-1}.$$

In order to see the above relation between $U_d$ and $U_e$, sum (2) over $i$ with $i \neq j$, 
and use symmetry to yield $e_{+j} = e_{j+}$, and obtain 

$$d_{+j} = e_{+j} - [e_{++} - e_{j+}]/(n-2) - (n-1)e_{+j}/(n-2) + e_{++}/(n-2) = 0,$$

so that 

$$d_{i+} = d_{+j} = d_{++} = 0. \qquad (5)$$

Since the distribution of $U_d$ is a simple translation of that for $U_e$ we study $U_d = U$; 
henceforth we suppress the $d$. 

Applying Lemma 1 with $g_{ij} = d_{ij}$, since $d_{i+} = 0$ we have $f_{ij} = d_{ij}$ and 
therefore 

$$EU = 0 \quad \text{and} \quad \mathrm{Var}(U) = \frac{2(n-2)}{(n-1)(n-3)} \sum_{|\{i,j\}|=2} d_{ij}^2, \qquad (6)$$

using also $d_{ij} = d_{ji}$. 

In terms of the symmetric but otherwise arbitrary values $e_{ij}$ which may not 
satisfy (5), the variance in Theorem 1 is obtained by substituting (2) into (6). 
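
The mean and variance computed in this section can be checked by brute force for small n. The sketch below is an illustration under stated assumptions rather than part of the proof: it enumerates all 15 matchings of n = 6 points, computes the exact mean and variance of U, and compares them with $e_{++}/(n-1)$ and with $\frac{2(n-2)}{(n-1)(n-3)} \sum_{i \neq j} d_{ij}^2$, where $d_{ij}$ is the centering in (2). The helper names and the toy scores `x` are hypothetical.

```python
from fractions import Fraction

def center(e):
    # d_ij of equation (2); assumes e is symmetric with zero diagonal.
    n = len(e)
    row = [Fraction(sum(e[i])) for i in range(n)]
    total = sum(row)
    return [[Fraction(0) if i == j else
             e[i][j] - (row[i] + row[j]) / (n - 2) + total / ((n - 1) * (n - 2))
             for j in range(n)] for i in range(n)]

def all_matchings(items):
    # Recursively pair the first item with each remaining partner.
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for k, partner in enumerate(rest):
        for tail in all_matchings(rest[:k] + rest[k + 1:]):
            yield [(first, partner)] + tail

def u_statistic(a, matching):
    return sum(a[i][j] + a[j][i] for i, j in matching)

x = [3, 1, 4, 1, 5, 9]  # toy scores (assumed)
n = len(x)
e = [[0 if i == j else -abs(a - b) for j, b in enumerate(x)]
     for i, a in enumerate(x)]
d = center(e)

values = [Fraction(u_statistic(e, m)) for m in all_matchings(list(range(n)))]
mean = sum(values) / len(values)
var = sum((v - mean) ** 2 for v in values) / len(values)

mean_formula = Fraction(sum(map(sum, e)), n - 1)
sum_d2 = sum(d[i][j] ** 2 for i in range(n) for j in range(n))
var_formula = Fraction(2 * (n - 2), (n - 1) * (n - 3)) * sum_d2
```

Exhaustive enumeration grows as (n-1)(n-3)···1, so this check is only practical for small n.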

2.2 Normal Approximation Upper Bound 

We apply the following theorem, which is a special case of (1.10) of Theorem 1.2 
of Rinott and Rotar (1997), when R = 0, using (1.12). The latter is based on 
Stein's method (Stein 1986, pg 35), with an improvement on the rates under some 
condition. 

Theorem 2 Let $(W, W^*)$ be exchangeable with $EW = 0$ and $EW^2 = 1$ such 
that for $0 < \lambda < 1$ we have 

$$E(W^* | W) = (1 - \lambda) W. \qquad (7)$$

If 

$$|W^* - W| \le A \qquad (8)$$

for a constant $A$, then 

$$\delta = \sup_w |P(W \le w) - \Phi(w)|$$

< ^4VVar{E[{W* - Wy\W]} + ./^ ^^^ + f^^^' . (9) 
A V vr A 

We briefly indicate the idea behind the proof of a theorem of this type. This dis- 
cussion can serve as some introduction to Stein's method for the interested reader, 
but it is not necessary for the rest of the paper. 

First note that a random variable $W$ has the standard normal distribution if and 
only if 

$$E f'(W) = E W f(W) \qquad (10)$$

holds for all continuous and piecewise continuously differentiable functions $f$, for 
which the expectations in (10) exist. This motivates the differential equation (12) 
in the lemma below. 

Set $\Phi h = E h(Z)$, where $Z$ is standard normal, and $h$ is a function for which 
the expectation exists. Also, for a real valued $h$, let $\|h\|$ denote the sup norm, that 
is, $\|h\| = \sup_x |h(x)|$. The lemma below is elementary, though the bounds in (13) 
require some calculations. 

Lemma 2 Let $h$ be a bounded piecewise continuously differentiable real valued 
function. The function 

$$f(w) = e^{w^2/2} \int_{-\infty}^{w} [h(x) - \Phi h]\, e^{-x^2/2}\, dx \qquad (11)$$

solves the (first order linear) differential equation 

$$f'(w) - w f(w) = h(w) - \Phi h \qquad (12)$$

and 

$$\text{(a) } \|f\| \le \sqrt{2\pi}\, \|h\|, \quad \text{(b) } \|f'\| \le 2\|h\|, \quad \text{(c) } \|f''\| \le 2\|h'\|. \qquad (13)$$
Now, exchangeability of $(W, W^*)$ and (7) directly imply 

$$E\{W f(W)\} = \frac{E\{(W^* - W)[f(W^*) - f(W)]\}}{2\lambda}.$$

Together with (12) this implies 

$$E h(W) - \Phi h = E f'(W) - \frac{E\{(W^* - W)[f(W^*) - f(W)]\}}{2\lambda}. \qquad (14)$$

The first term in the Taylor expansion of $f(W^*) - f(W)$ is $(W^* - W) f'(W)$, 
and the r.h.s. of (14) is bounded by $\frac{1}{2\lambda} E\{f'(W)[2\lambda - (W^* - W)^2]\}$ plus a remainder 
term which we now ignore. By (7) we have $E(W^* - W)^2 = 2\lambda$, and (13 b) 
and the Cauchy Schwarz inequality readily yield 

$$E\{f'(W)[2\lambda - (W^* - W)^2]\} \le 2\|h\| \sqrt{\mathrm{Var}\{E[(W^* - W)^2 | W]\}}.$$

Using (14), an approximation of an indicator function of a half line by a smooth 
$h$ yields a term similar to the first term on the r.h.s. of (9), and calculation of the 
remainder in the above Taylor expansion yields the second. To obtain the precise 
bound a certain induction and further calculations are needed. □ 

We shall apply Theorem 2 to $W = U/\sigma$, but for convenience we first describe 
the coupling and compute the relevant quantities in terms of $U$. Given a 
permutation $\pi$ chosen uniformly from $\Pi_n$ construct the permutation $\pi^*$ in $\Pi_n$ by 
choosing $I, J$ distinct and uniformly, and imposing $\pi^*(I) = J$ (and therefore 
$\pi^*(J) = I$), and $\pi^*(\pi(I)) = \pi(J)$ (and therefore $\pi^*(\pi(J)) = \pi(I)$) and fixing 
the values of $\pi^*(k) = \pi(k)$ for $k \notin \{I, J, \pi(I), \pi(J)\}$. With $U = \sum_i d_{i\pi(i)}$, let 
$U^* = \sum_i d_{i\pi^*(i)}$. 
To verify (7), first note that 

$$U^* - U = 2\big(d_{IJ} + d_{\pi(I)\pi(J)} - (d_{I\pi(I)} + d_{J\pi(J)})\big), \qquad (15)$$

where the factor 2 accounts for the symmetry $d_{ij} = d_{ji}$. 
Letting $C$ be the event that $J \neq \pi(I)$, we have $(U^* - U) = (U^* - U)\mathbf{1}_C$ and 
therefore 

$$E((U^* - U) | U) = E((U^* - U)\mathbf{1}_C | U).$$

For the first two terms in (15), recalling $d_{++} = 0$, and using that $(I, J)$ is 
independent of $\pi$ and equals any of the $n(n-1)$ pairs for which $i \neq j$, 

$$E(d_{IJ}\mathbf{1}_C | \pi) = \frac{1}{n(n-1)} \sum_{|\{i,j\}|=2} d_{ij}\, \mathbf{1}(j \neq \pi(i)) = \frac{1}{n(n-1)} \Big( d_{++} - \sum_{i=1}^n d_{i\pi(i)} \Big) = \frac{-1}{n(n-1)}\, U,$$

and similarly for the term $d_{\pi(I)\pi(J)}$, as $(\pi(I), \pi(J))$ has the same distribution as 
$(I, J)$. 

Now consider the third term on the right hand side in (15): 

$$E(d_{I\pi(I)}\mathbf{1}_C | \pi) = \frac{1}{n(n-1)} \sum_{|\{i,j\}|=2} d_{i\pi(i)}\, \mathbf{1}(j \neq \pi(i)) = \frac{1}{n(n-1)} \sum_{i=1}^n d_{i\pi(i)} \sum_{j \neq i} \mathbf{1}(j \neq \pi(i)) = \frac{n-2}{n(n-1)} \sum_{i=1}^n d_{i\pi(i)} = \frac{n-2}{n(n-1)}\, U.$$

By symmetry the same is true for the term $d_{J\pi(J)}$. 

Collecting terms and using $\mathcal{F}(U) \subset \mathcal{F}(\pi)$, where $\mathcal{F}(X)$ denotes the sigma 
field generated by the random variable $X$, we have 

$$E(U^* - U | U) = -\frac{2}{n(n-1)}\, (2 + 2(n-2))\, U = -\frac{4}{n}\, U.$$

Thus (7) holds with $\lambda = 4/n$. 
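
The linearity condition with $\lambda = 4/n$ can be confirmed numerically for a fixed matching by averaging $U^* - U$ over all $n(n-1)$ ordered choices of $(I, J)$. The sketch below does this in exact arithmetic; it is an illustration only, with hypothetical helper names (`center`, `coupled`, `u_statistic`), toy scores `x`, and the 0-based special pairing used as the fixed $\pi$.

```python
from fractions import Fraction

def center(e):
    # d_ij of equation (2); assumes e is symmetric with zero diagonal.
    n = len(e)
    row = [Fraction(sum(e[i])) for i in range(n)]
    total = sum(row)
    return [[Fraction(0) if i == j else
             e[i][j] - (row[i] + row[j]) / (n - 2) + total / ((n - 1) * (n - 2))
             for j in range(n)] for i in range(n)]

def u_statistic(d, pi):
    return sum(d[i][pi[i]] for i in range(len(pi)))

def coupled(pi, I, J):
    # Build pi*: pair I with J and pi(I) with pi(J); leave everything else alone.
    if pi[I] == J:
        return list(pi)          # I and J are already paired, so pi* = pi
    a, b = pi[I], pi[J]
    star = list(pi)
    star[I], star[J] = J, I
    star[a], star[b] = b, a
    return star

x = [3, 1, 4, 1, 5, 9]  # toy scores (assumed)
n = len(x)
e = [[0 if i == j else -abs(p - q) for j, q in enumerate(x)]
     for i, p in enumerate(x)]
d = center(e)

pi = [k + 1 if k % 2 == 0 else k - 1 for k in range(n)]  # pairs (0,1), (2,3), (4,5)
u = u_statistic(d, pi)

# Sum of U* - U over all ordered pairs (I, J) with I != J.
total_change = sum(u_statistic(d, coupled(pi, I, J)) - u
                   for I in range(n) for J in range(n) if I != J)
mean_change = total_change / (n * (n - 1))
```

Here `mean_change` is the conditional expectation of $U^* - U$ given this particular $\pi$, which the derivation above says should equal $-(4/n)U$.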

Now we consider the first term in the bound in Theorem 2; since $\mathcal{F}(U) \subset \mathcal{F}(\pi)$, 

$$\mathrm{Var}\{E[(U^* - U)^2 | U]\} \le \mathrm{Var}\{E[(U^* - U)^2 | \pi]\}. \qquad (16)$$
From (15), 

$$E((U^* - U)^2 | \pi) = 4 E\big([(d_{IJ} + d_{\pi(I)\pi(J)}) - (d_{I\pi(I)} + d_{J\pi(J)})]^2 \,\big|\, \pi\big). \qquad (17)$$

When we expand the square we get the following types of terms: (i) the square 
terms from the first group of parentheses, (ii) mixed terms formed by taking one 
term from the first group with one term from the second, (iii) the square terms 
from the second group, (iv) mixed terms between values in the first group, and (v) 
mixed terms between values in the second group. 

(i) The value of the conditional expectation for the square term $E(d_{IJ}^2 | \pi)$ 
clearly does not depend on $\pi$, and therefore contributes a constant value which 
does not affect the variance. The same is true for $E(d_{\pi(I)\pi(J)}^2 | \pi)$ because as $(I, J)$ 
range over all possible distinct pairs with equal probability so do $(\pi(I), \pi(J))$. 

(ii) Terms such as $E(d_{IJ} d_{I\pi(I)} | \pi)$ evaluate to zero. In this particular case take 
expectation over $J$ first and use $d_{i+} = 0$. 

By tallying the contributions from terms (iii), (iv), and (v), we conclude that, 
up to an additive constant not depending on $\pi$, and therefore not affecting the 
variance, (17) equals 

$$8 \left( \frac{1}{n} \sum_{i=1}^n d_{i\pi(i)}^2 + \frac{1}{n(n-1)} \sum_{|\{i,j\}|=2} d_{ij} d_{\pi(i)\pi(j)} + \frac{1}{n(n-1)} \sum_{|\{i,j\}|=2} d_{i\pi(i)} d_{j\pi(j)} \right). \qquad (18)$$

We may write (18) as $8(A_1 + A_2 + A_3)$ where 

$$A_1 = \frac{1}{n} \sum_{i=1}^n d_{i\pi(i)}^2, \quad A_2 = \frac{1}{n(n-1)} \sum_{|\{i,j\}|=2} d_{ij} d_{\pi(i)\pi(j)}, \quad A_3 = \frac{1}{n(n-1)} \sum_{|\{i,j\}|=2} d_{i\pi(i)} d_{j\pi(j)}.$$

In view of (16), we now need to compute the variance of (18) with respect to a 
uniform $\pi \in \Pi_n$. We have 

$$\mathrm{Var}(8(A_1 + A_2 + A_3)) \le 8^2 \cdot 3\, \big(\mathrm{Var}(A_1) + \mathrm{Var}(A_2) + \mathrm{Var}(A_3)\big).$$

To calculate $\mathrm{Var}(A_1)$, apply Lemma 1 with $g_{ij} = d_{ij}^2$ to obtain 



^ ' ^ |{ij}|=2 |{i,i}|=2 
For the second term above, by Cauchy-Schwarz, 



I E f^^f^^^J E E 4= E 4- (19) 

|{»J}|=2 ^ \{i,j}\=2 |{m}|=2 |{m}|=2 

Collecting terms we conclude 

^ ' \{i,j}\=^ I0j}|=2 

for n > 8. 

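We now turn to $\mathrm{Var}(A_2)$. With 

$$\mathcal{I} = \{I = (i, j, k, l, \pi(i), \pi(j), \pi(k), \pi(l)) : i \neq j,\ k \neq l,\ \pi \in \Pi_n\},$$

it can be shown that when $\pi$ is uniform over $\Pi_n$, the probability of a given $I \in \mathcal{I}$ 
satisfying $|I| = s$ is 

$$P(I) = \frac{1}{[n]_s}, \quad s \in \{2, 4, 6, 8\}, \quad \text{where } [n]_s = (n-1)(n-3) \cdots (n-s+1).$$

For $I = (i, j, k, l, i', j', k', l') \in \mathcal{I}$ set $d_I = d_{ij} d_{kl} d_{i'j'} d_{k'l'}$. We then have 

$$\mathrm{Var}\Big(\sum_{|\{i,j\}|=2} d_{ij} d_{\pi(i)\pi(j)}\Big) \le \sum_{|\{i,j\}|=2} \sum_{|\{k,l\}|=2} d_{ij} d_{kl}\, E\big(d_{\pi(i)\pi(j)} d_{\pi(k)\pi(l)}\big) = \sum_{I \in \mathcal{I}} \frac{d_I}{[n]_{|I|}} = \sum_{s \in \{2,4,6,8\}} \frac{1}{[n]_s} \sum_{I \in \mathcal{I}(s)} d_I \qquad (20)$$

where $\mathcal{I}(s)$ are all those $I \in \mathcal{I}$ with $|I| = s$. 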

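Consider first the case of $s = 8$. Since $d_{k'+} = 0$, summing over $l' \notin \{i, j, k, l, i', j', k'\}$, 
we have 

$$\sum_{I \in \mathcal{I}(8)} d_I = \sum_{I \in \mathcal{I}(8)} d_{ij} d_{kl} d_{i'j'} d_{k'l'} = - \sum_{|\{i, j, k, l, i', j', k'\}|=7}\ \sum_{l' \in \{i, j, k, l, i', j'\}} d_{ij} d_{kl} d_{i'j'} d_{k'l'}. \qquad (21)$$

Applying Cauchy Schwarz to each of the six terms in the inner sum, the absolute 
value of the expression is bounded by 

$$6\,(n-2)_5 \sum_{|\{i,j\}|=2} d_{ij}^4,$$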
where 

$$(n)_s = n(n-1) \cdots (n-s+1).$$

For $s \in \{2, 4, 6\}$ apply Cauchy Schwarz to 

$$\sum_{I \in \mathcal{I}(s)} d_{ij} d_{kl} d_{i'j'} d_{k'l'}$$

to obtain the bound 

$$(n-2)_{s-2} \sum_{|\{i,j\}|=2} d_{ij}^4.$$

Therefore 



4 



< 



4 



|{ij}|=2 



where the latter bound holds for n > 10 and follows by elementary calculations. 

Although $A_3$ and $A_2$ are not identically distributed, it is easy to see that the 
variance of $A_3$ can be bounded in exactly the same manner. 

We obtain from (fTSb and the above discussion that 

\Br{E[iU* - U)^\U]} < (8^ . 3) ^^df^. (22) 
We now apply Theorem 2 to $W = U/\sigma$, $W^* = U^*/\sigma$. From (6) we conclude 
that 

$$\mathrm{Var}(U) = \sigma^2 \ge \frac{2}{n} \sum_{|\{i,j\}|=2} d_{ij}^2.$$



It follows from (22) that 



Var{£;[(l^*-l^)2|l^]}<^-|:^ dlf.(23) 

With 

$$a = \max |d_{ij} - d_{kl}|,$$

we have $|U^* - U| \le 4a$, and hence 

$$|W^* - W| \le \frac{1}{\sigma}\, |U^* - U| \le \frac{4a}{\sigma} \le \frac{4a\sqrt{n}}{\sqrt{2 \sum_{|\{i,j\}|=2} d_{ij}^2}} = A.$$

Applying Theorem 2 with this $A$, $\lambda = 4/n$ and using expression (23), we have 